Evaluating SPLASH-2 Applications Using MapReduce

Authors

  • Shengkai Zhu
  • Zhiwei Xiao
  • Haibo Chen
  • Rong Chen
  • Weihua Zhang
  • Binyu Zang
Abstract

MapReduce has become prevalent for running data-parallel applications. By hiding non-functional concerns such as parallelism, fault tolerance, and load balancing from programmers, MapReduce significantly simplifies the programming of large clusters. Because of these features, researchers have also explored the use of MapReduce in other application domains, such as machine learning, text retrieval, and statistical translation. In this paper, we study the feasibility of running typical supercomputing applications using the MapReduce framework. We port two applications (Water Spatial and Radix Sort) from the Stanford SPLASH-2 suite to MapReduce. By thoroughly evaluating them in Hadoop, an open-source MapReduce framework for clusters, we analyze their major performance bottlenecks in the MapReduce framework. Based on this analysis, we also provide several suggestions for enhancing the MapReduce framework to suit these applications.

Similar resources

On Aligning Massive Time-Series Data in Splash

Important emerging sources of big data are large-scale predictive simulation models used in e-science and, increasingly, in guiding policy and investment decisions around highly complex issues such as population health and safety. The Splash project provides a platform for combining existing heterogeneous simulation models and datasets across a broad range of disciplines to capture the behavior...

Profiling and evaluating hardware choices for MapReduce environments: An application-aware approach

The core business of many companies depends on the timely analysis of large quantities of new data. MapReduce clusters that routinely process petabytes of data represent a new entity in the evolving landscape of clouds and data centers. During the lifetime of a data center, old hardware needs to be eventually replaced by new hardware. The hardware selection process needs to be driven by perform...

MRBS: A Comprehensive MapReduce Benchmark Suite

MapReduce is a promising programming model for distributed data processing. Extensive research has been conducted on the scalability of MapReduce, and several systems have been proposed in the literature, ranging from job scheduling to data placement and replication. However, realistic benchmarks are still missing to analyze and compare the effectiveness of these proposals. To date, most MapRed...

Experiences on the Implementation of PARMACS Macros Using Different Multiprocessor Operating System Interfaces

In order to evaluate the goodness of parallel systems, it is necessary to know how parallel programs behave. The SPLASH-2 applications provide us with a realistic workload for such systems. So, we have made different implementations of the PARMACS macros used by SPLASH-2 applications, based on several execution and synchronization models, from classical Unix processes to multithreaded systems. ...

LNCS 7640 - Euro-Par 2012: Parallel Processing Workshops

MapReduce is a popular programming model for distributed data processing. Extensive research has been conducted on the reliability of MapReduce, ranging from adaptive and on-demand fault-tolerance to new fault-tolerance models. However, realistic benchmarks are still missing to analyze and compare the effectiveness of these proposals. To date, most MapReduce fault-tolerance solutions...

Publication date: 2009